ABSTRACT

In the 1983-84 school year, the Michigan State Board
of Education conducted a survey of early childhood programs in all of
the state's school districts. A total of 518 districts, or 93
percent, responded. Of these, 161 districts indicated that they had a
2-year developmental program for children who were old enough to
enter kindergarten but were judged not ready for the regular
kindergarten program. Schools with a readiness kindergarten program
were asked what type of screening instrument they used. Using test
reviews from the Seventh through Ninth Mental Measurements
Yearbooks, all tests were examined for representative norming samples,
validity data, and reliability data. Many were found to be deficient
in one or more areas and many were found to be inapplicable or
inappropriate. None of the screening instruments used by districts in 
1984 met criteria of representative sampling, reliability, and 
validity. It is concluded that placements of young children into 
2-year developmental readiness programs should be made with great 
caution. Given the lack of statistical data, screenings that result 
in indications of deficiencies should be followed by extensive 
examinations. (RH) 
Abstract

In 1984 the Michigan Department of Education conducted
a survey of all school districts. One hundred
sixty-one school districts responded that they had a
2-year developmental program, labeled readiness
kindergarten, for children age 5 by December 1. These
districts also reported the types of screening
or readiness testing instruments that were used for
placement. Using test reviews from the Seventh through
Ninth Mental Measurements Yearbooks, all tests were
examined for representative norming samples, validity
data, and reliability data. Many were found deficient
in one or more areas. Many were found to be
inapplicable or inappropriate.
In the 1983-84 school year the Michigan State
Board of Education (1984) conducted a survey of early
childhood programs in all school districts. The return
rate for the survey was 93% (n=518). "Readiness
Kindergarten" was defined as a program designed for
those children who are five by December 1, but who are
determined "not ready" for the regular kindergarten
program. Schools having a readiness kindergarten
program were asked what type of screening instrument
was used for their program. A ranked listing of
responses is presented in Table 1.



Insert Table 1 about here 



In order for educational decisions to be informed
and appropriate, certain measurement standards need to
be met. The test needs to be valid, reliable, and
applicable. A case for validity can be made based upon
the content as it relates to professional theory,
research, or literature. However, when one uses a
screening instrument for placement, a case should also
be made for its predictive accuracy. This quality of
predictive validity has been labeled criterion-related
validity. In order to determine reliability a test
maker may use any of the following statistical
techniques: test-retest, alternate forms, split-half, and
measures of homogeneity. A test cannot be valid if it
is not reliable. In choosing a test for screening, one
must determine if it is applicable to the sample for
which it is intended. Since all school districts are
different, test makers often will pilot a test with a
sample representative of various population
characteristics. This norming process establishes a test's
external, or population, validity. To give a test with
norms based on one sample to a totally different
sample with different characteristics and then apply
the same norms would be inappropriate and raise doubts
about population validity (Thorndike and Hagen, 1977;
Ary, Jacobs, and Razavieh, 1985; Kerlinger, 1986; Isaac
and Michael, 1987).
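
The split-half technique named above can be illustrated with a
short computation. The sketch below is not drawn from any of the
reviewed tests: the scores are hypothetical, the function names are
illustrative, and the Spearman-Brown step-up is the conventional
correction for estimating full-length reliability from the
correlation between two half-tests.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length reliability estimate."""
    return 2 * r_half / (1 + r_half)


# Hypothetical scores for six children on the odd- and even-numbered items.
odd_half = [4, 6, 5, 8, 7, 3]
even_half = [5, 6, 4, 9, 7, 2]

r_half = pearson_r(odd_half, even_half)
split_half_reliability = spearman_brown(r_half)
print(f"half-test r = {r_half:.2f}, "
      f"split-half reliability = {split_half_reliability:.2f}")
```

Because the corrected coefficient depends entirely on the sample
tested, a reliability figure of this kind says nothing about how the
test behaves in a population unlike the norming sample.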

A search was made in the Seventh, Eighth, and
Ninth Mental Measurements Yearbooks (Buros, 1972; Buros,
1978; and Mitchell, 1985) for all test reviews regarding
these screening instruments to determine whether
professional reviewers had determined whether these
instruments met standards for validity, reliability,
and population validity. What follows is a summary of
test reviews found. A summary of findings is presented
in Table 2.



Insert Table 2 about here 



Gesell School Readiness Test

The Gesell Test is based on years of clinical
experience and theory by the Gesell Institute.
However, its major limitations are its absence of data
on reliability of any form; only one validation study
relating scores to teachers' ratings of performance;
lack of cutoff scores to demonstrate discriminant
validity; and 1928 norms based only on one group of
white, middle-class, New England students (Bradley,
1985; Waters, 1985). In a study using the Gesell Test
to predict later diagnosis for special needs, there was
a 21% error rate; the error rate increased when cutoff
scores were lowered; one-half of those determined ready
for kindergarten did not have a successful kindergarten
experience; and although there was a significant
difference related to success or failure, the difference
accounted for only 22% of the variance in the criterion
measure (Wood, Powell, and Knight, 1984). Another
study, although finding a reliability coefficient of
.84, found that the unsystematic clinical method used to
score the test created such a large error of measurement
that a 4.5 developmental age could not be readily
distinguished from a 5.0 developmental age (Kaufman
and Kaufman, 1972). The same study found a correlation
of .64 between the Gesell Test and first grade Stanford
Achievement Tests. Shepard and Smith (1986),
in their review of the Gesell Test, concluded that it
did not meet the standards of the American
Psychological Association for validity, reliability, or
normative information.
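
The correlation and the variance figure reported for the Gesell
Test are on different scales, and converting between them makes the
comparison easier. The sketch below is offered only as an arithmetic
illustration: it applies the standard relationship that the share of
variance explained equals the squared correlation, and the function
names are illustrative rather than taken from any of the cited
studies.

```python
from math import sqrt


def variance_explained(r):
    """Share of criterion variance accounted for by a correlation r."""
    return r ** 2


def implied_correlation(share):
    """Correlation magnitude implied by a share of variance explained."""
    return sqrt(share)


gesell_sat_r = 0.64     # Gesell vs. first grade Stanford Achievement Tests
variance_share = 0.22   # variance explained in the special needs study

print(f"{variance_explained(gesell_sat_r):.0%}")     # 41%
print(f"{implied_correlation(variance_share):.2f}")  # 0.47
```

On this reading, even the stronger of the two results leaves well
over half of the variance in later performance unaccounted for.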

ABC Inventory to Determine Kindergarten and School Readiness

A review of the ABC Inventory found a claimed
criterion validity coefficient with the Stanford-Binet
of .78. However, the sample for this study was very
small (n=14). No information is presented on the
characteristics of the norming sample. While a
predictive validity study was conducted for future
success in kindergarten with a resultant .70 correlation,
the test correctly identified 86% who failed, but
failed to correctly identify 37% who passed. The
predictive study was also questioned due to the fact
that the kindergarten class studied had a 26% failure
rate (Weikart, 1972).
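
The reason the 26% failure rate matters can be shown with a short
calculation. The sketch below is an illustration under stated
assumptions, not an analysis from the review: it reads the 86%
figure as the proportion of failing children the test flagged, reads
the 37% figure as the proportion of passing children the test
wrongly flagged, and uses a hypothetical 10% failure rate for
comparison.

```python
def positive_predictive_value(hit_rate, false_alarm_rate, base_rate):
    """Proportion of flagged children who actually go on to fail.

    hit_rate:         share of failing children the test flags
    false_alarm_rate: share of passing children the test also flags
    base_rate:        share of all children who fail
    """
    true_flags = hit_rate * base_rate
    false_flags = false_alarm_rate * (1 - base_rate)
    return true_flags / (true_flags + false_flags)


# Figures reported in the review, interpreted as described above.
reported = positive_predictive_value(0.86, 0.37, 0.26)

# Hypothetical class with a lower failure rate (assumption, not from the review).
lower_base = positive_predictive_value(0.86, 0.37, 0.10)

print(f"flagged children who fail, 26% base rate: {reported:.0%}")
print(f"flagged children who fail, 10% base rate: {lower_base:.0%}")
```

Under these assumptions fewer than half of the children flagged would
actually have failed, and the proportion falls further as the
failure rate in the class falls.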

DIAL: Developmental Indicators for the Assessment of Learning

DIAL was intended as a screening instrument to
determine if a preschooler needed further evaluation,
not for placement. DIAL was normed using a stratified
sample located only in Illinois with an overrepresentation
of blacks and low SES children. Reviewers found
sufficient evidence for content validity. Claims of
criterion validity against I.Q. and mental age scores are
made by the test authors, but no correlations are
presented. A predictive validity study was done between
DIAL scores and standardized tests after two years.
Predictive validity coefficients ranged from .45 to .73
with a median of .56. No reliability data is presented
(Grill, 1978; McCarthy, 1978).

Brigance Diagnostic Inventory

The Brigance Inventory is a criterion-referenced
measure. The measure obtains content validity from
field testing and an extensive review of literature.
No predictive or criterion validity is presented. No
reliability data is presented (Saigh, 1985). In 1982
the same author developed the Brigance K and 1 Screen
for Kindergarten and First Grade. The Screen is to be
used to rank children relative to their local reference
group, but no information is provided to make placement
decisions. Once again, no validity or reliability
statistics are provided (Boehm, 1985).

Locally Developed Objective Reference Tests

One can only assume that locally developed tests
may have some reason to include certain test items for
content validity. An assumption is made that these are
criterion-referenced tests which are intended to
measure mastery of skills. Each local district would
have to answer the question as to whether any
predictive or criterion validation or reliability
studies have been conducted and what norming has
occurred to make comparative judgments.

Lesiak

No reference was found to a screening device or
test titled or authored by Lesiak in the Seventh through
Ninth Editions of the Mental Measurements Yearbook
(1972-1985).

Caldwell 

No reference was found to a screening device or
test titled or authored by Caldwell in the Seventh
through Ninth Editions of the Mental Measurements
Yearbook (1972-1985).

Beery

No review was found of the Beery Developmental Test
of Visual Motor Integration in the Seventh through Ninth
Editions of the Mental Measurements Yearbook (1972-1985).

Deu-Task of K-R

The test was designed so that a composite score
could not be obtained, to discourage deciding whether a
child is or is not ready to enter kindergarten. The
norming sample had a disproportionate percentage of
low-income and Caucasian children. A case for content
validity is made based upon similar tests and preschool
and kindergarten curricula. While the test-retest
reliability was .90, the correlations for split-half
ranged from .04 to .93, with only 4 of 12 subtests
adequately able to distinguish between high and low
performing children. A predictive validity study was
conducted against scores on the Metropolitan Readiness
Test. Correlations ranged from .20 to .62 (Gray,
1985; White, 1985).
Anton Brenner Developmental Gestalt Test of School
Readiness

The norms for this test were developed from 750
kindergarten and first grade children in Mt. Clemens,
Michigan. No descriptive data is presented to determine
the representativeness of the sample. No predictive
validity is presented. Criterion validity was
determined by comparisons to Metropolitan Readiness
Tests which resulted in correlations between .61 and
.81. Reliability was determined using test-retest
(.55-.74) and split-half (.83-.92). The reviewer did
not recommend the use of this test by teachers due to
the ambiguity in the manual regarding interpretation
(Deloria, 1972).

Boehm Test of Basic Concepts

The purpose of the test is to assess students'
knowledge of frequently used basic concepts. A case is
made for content validity based upon a content analysis
of curriculum materials and pilot testing. The
standardization sample was geographically representative.
No ethnic representation figures are presented
and a disproportionately large number of lower
socioeconomic students were included. Reliability
coefficients ranged from .68 to .90. Other than
content validity, no further types of validity are
reported (McCandless, 1972; Smock, 1972).

Metropolitan Achievement Test (MAT)

The MAT includes both achievement and criterion-
referenced tests for K.0 through 9.9. Item response
theory and curriculum analysis were used to establish
content validity and the discriminating ability of test
items. The norms were established from a national and
representative sample. Both split-half and tests of
homogeneity are reported with correlations at or above
.80. Criterion validity was established with the Otis-
Lennon School Ability Test. No predictive validity is
reported (Haertel, 1985; Linn, 1985). The
Metropolitan Readiness Test (MRT) was designed as an
assessment to determine readiness for reading. Like the
MAT, the MRT has extensive norming, reliability, and
content validity data. Predictive validity studies with
later achievement tests resulted in a .60 correlation
with future reading achievement. The test was not
intended to be diagnostic of specific deficiencies or
disabilities (Ravitch, 1985).
Peabody Picture Vocabulary Test

The response on the original survey was for the
Peabody PFS. No such test was found in the Seventh
through Ninth Mental Measurements Yearbooks. What
follows is a summary of reviews of the Peabody Picture
Vocabulary Test-Revised (PPVT-R). The PPVT-R was
designed as a measure of receptive language. The
norming sample was national and representative.
Reliability coefficients for homogeneity, test-retest,
and split-half reliability range from .61 to .91.
Comparisons of PPVT-R scores with I.Q. measures and
ability tests have resulted in correlations ranging from
.16 to .78 for criterion validity. No predictive validity
statistics were available (McCallum, 1985; Wiig,
1985).

California Psychological Inventory (CPI)

The CPI was intended as an assessment of interpersonal
behavior or social interaction for children
ages 13 and over. There is a question as to the
discriminating ability of the subtests. Criterion validity
tests resulted in correlations ranging from .2 to .5.
Despite its popularity, little information and little
research is provided to interpret results. No
information is provided on reliability. No predictive
validity information is provided (Baucom, 1985;
Eysenck, 1985).

DABERON: A Screening Device for School Readiness

DABERON was referred to and described in The Ninth
Mental Measurements Yearbook. No review was found in
the seventh through ninth editions.

Dallas

No review was found in the Seventh through Ninth
Mental Measurements Yearbooks.

Eliot-Pearson Screening Inventory

The Eliot-Pearson Screening Inventory was referred
to and described in The Ninth Mental Measurements
Yearbook. No review was found in the seventh through
ninth editions.

Frostig

The Frostig was developed as an assessment of
sensory-motor and movement skills for students ages 6
through 12. The standardization sample consisted only
of Caucasian children from one school district in
California. Content validity is claimed based upon
theory and research studies with adolescents and
adults. No criterion-related or predictive validity is
presented. Correlations for reliability using common
factor variance range from .44 to .38 with a median
coefficient of .60. According to Oakland (1985) these
correlations were too low to make judgments about
placement. Rosen (1985) concluded in his review that
inadequacies in the test made it unacceptable in its
present state (Oakland, 1985; Rosen, 1985).

Haptic Perception

No review was found in the Seventh through Ninth
Mental Measurements Yearbooks. The only reference to
"Haptic" was the Haptic Intelligence Scale for the
adult blind.

Miller Assessment for Preschoolers (MAP)

The MAP was designed as a screening test for
identifying children who exhibit moderate "preacademic
problems." A case for content validity is made based
upon preschool tests, research, theory, and pilot
studies. The norming sample was a stratified national
sample. Studies comparing the MAP to the WPPSI and ITPA
resulted in correlations of .27 and .31 respectively.
Reliability correlations were .98 for interrater
reliability and .79 for tests of homogeneity. No
predictive validity statistics are presented
(Deloria, 1985; Michael, 1985). A follow-up study of
the original sample was done by the test author four
years after initial screening. While predictive
correlations were not presented, a significant
difference (p<.01) was found for those identified
as deficient in the areas of retentions, teacher
observations, special services, and below average
report cards (Miller, 1988).

Miller Preschool Assessment

No such test was found. The assumption is made
that this response was inaccurate and probably meant
the Miller Assessment for Preschoolers (see above).

Minnesota Child Development Inventory (MCDI)

The MCDI was designed as a supplement to a parent
interview in order to identify children with below
average developmental abilities through parent
experiences with the child. The norms are based upon a
sample of white, middle-class, intact families. Split-
half reliability coefficients were derived from the
sample with a median correlation of .79. No validity
data is presented.

Zimmerman

The only test referred to with the name
"Zimmerman" in the Seventh through Ninth Mental
Measurements Yearbooks was the Zimmerman-Sanders Social
Studies Test intended for grades 7 and 8.

Discussion

In order for decisions to be accurate and informed
using test data, the test must be valid, reliable, and
based on a representative norming sample. A
reexamination of Table 2 will show that none of the
screening instruments reportedly used in the 1984
Michigan survey meets all of the criteria. A test
should not be used to identify and place children
without reliability and validity data (Meisels, 1987).
Shepard and Smith (1986), in their examination of tests
in use at the kindergarten level, concluded that none of
the existing tests is accurate enough to justify
removing children from their normal peer group and
placing them in two-year programs. A justification for
placement may be made if predictive validity data
indicate potential problems. However, as was shown,
few tests have predictive data and even those that do
have minimal data.
The use of many of the tests reported for screening
and placement is inappropriate or inapplicable.
Achievement and criterion-referenced tests measure
current abilities, skills, or achievement and do not
presume how much a student could or could not learn in
regular or readiness kindergartens. Many simply were
not designed for the purpose of screening and placement
(e.g., CPI). Meisels (1987) argues that readiness tests
are only to be used to assess current abilities and to
facilitate curriculum planning. The National Association
for the Education of Young Children appears
to agree, having taken the following position: "It is
the responsibility of the educational system to adjust
to the developmental needs and levels of the children
it serves . . . ." (1985, p. 16).

Placements into two-year programs should be made
with the greatest of caution. Given the lack of
statistical data, screenings resulting in indications
of deficiencies should be followed by more extensive
examinations as intended by many test authors.
Table 1

Ranked Responses for Screening Instruments,
Readiness Kindergarten

                                              Frequency of
                                              Districts
Instrument                                    Reporting Use

Gesell                                             48
ABC                                                19
DIAL                                               16
Brigance Diagnostic                                11
Locally developed objective reference test          9
Lesiak                                              6
Caldwell                                            5
Beery                                               3
Deu-Task of K-R                                     3
Anton Brenner, Brenner Gestalt                      3
Boehm Slater                                        2
MAT                                                 2
Peabody
CPI
Daberon
Dallas
Eliot-Pearson
Frostig
Haptic Perception
MAP
Miller Preschool Assessment
Minnesota
Zimmerman
Table 2

Test Reviews of Readiness Kindergarten Screening Instruments in Mental
Measurements Yearbooks

                              Validity
Name               Content  Predictive  Criterion  Reliability  Norms  Comment

Gesell                +       .64?        22%        .84?              21% or greater error rate
ABC                   U       .70?        .78                          37% false negatives
DIAL                  +       0           .56 IQ     0
Brigance              0       0           0                            criterion-referenced
Locally Developed     ?                   ?          ?
Lesiak                U       U           U          U          U      no review
Caldwell              U       U           U          U          U      no review
Beery                 U       U           U          U          U      no review
Deu-Task              +       0           .20-.62    .90
Brenner               0       0           .66-.75    .54-.92
Boehm                 +       0           0          .68-.90
MAT                   +       0           +I.Q.      .75-.90
Peabody               +       0           .16-.78    .52-.91
CPI                   0       0           .2-.5                 ?      for ages 13 & over
Daberon               U       U           U          U          U      no review
Dallas                U       U           U          U          U      no review
Eliot-Pearson         U       U           U          U          U      no review
Table 2 (Continued)

                              Validity
Name               Content  Predictive  Criterion  Reliability  Norms  Comment

Frostig                                              .60        ?      judged unacceptable
Haptic                U       U           U          U          U      no review
MAP                   +       0           .27-.31    .79-.98
Minnesota             0       0           0          .79
Zimmerman             U       U           U          U          U      no review

+ = present in reviews
? = in question or doubtful
0 = not present by test author or reviewers
U = unknown